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.in . 1 INTRODUCTION 



ABSTRACT 

A Bayesian multi-planet Kepler periodogram has been developed for the analysis 
of precision radial velocity data (Gregory 2005b and 2007). The periodogram em- 
ploys a parallel tem pering Markov chain Monte Carlo algorithm. The HD 11964 data 
(|Butler et alJ l2006) has been re-analyzed using 1, 2, 3 and 4 planet models. Assuming 
that all the models are equally probable a priori, the three planet model is found to 
be ^ 600 times more probable than the next most probable model which is a two 
planet model. The most probable model exhibits three periods of 38.02 "" 
and 1924±t| d, and eccentricities of 0.22±g;^, 0.63±g$ and 0.05 
Assuming the three signals (each one consistent with a Keplerian orbit) are caused by 
planets, the corresponding limits on planetary mass (Msinz) and semi-major axis are 

an), 
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respectively. The small difference (1.3<r) between the 360 day period and one year 
suggests that it might be worth investigating the barycentric correction for the HD 
11964 data. 

Key words: Extrasolar planets, Bayesian methods, model selection, time series anal- 
ysis, periodogram, HD 11964. 



Improvements in precision radial velocity measurements and 
continued monitoring are permitting the detection of lower 
amplitude planetary signatures. One example of the fruits of 
this work is the detection of a super earth i n the habital zone 
surrounding Gliese 581 (|Urdv et al.l 120071 ). This and other 
remarkable successes on the part of the observers is moti- 
vating a significant effort to improve the statistical tools 
for a nalyzing radial velo city data (e.g., Ford fc Gregory! 
20061. Ford 2005 fc 2006 iGregoryj l2005bl . ICummind [200T 
,[Loi 



Loredo fc Chernofllliool . lLoredol 12004 ). Much of the recent 
work has highlighted a Bayesian MCMC approach as a way 
to better understand parameter uncertainties and degenera- 
cies and to compute model probabilities. 

Gregory (2005a, b & c and 2007) presented a Bayesian 
MCMC algorithm that makes use of parallel tempering to 
efficiently explore a large model parameter space starting 
from a random location. It is able to identify any signif- 
icant periodic signal component in the data that satisfies 



Kepler's laws and thus functions as a Kepler periodogram 0. 
This eliminates the need for a separate periodogram search 
for trial orbital periods which typically assume a sinusoidal 
model for the signal that is only correct for a circular or- 
bit. In addition, the Bayesian MCMC algorithm provides 
full marginal parameters distributions for all the orbital ele- 
ments that can be determined from radial velocity data. The 
samples from the parallel chains can also be used to compute 
the marginal likelihood for a given model (|Gregorvll2005ah 
for use in computing the Bayes factor that is needed to com- 
pare models with different numbers of planets. The parallel 
tempering MCMC algorithm employed in this work includes 
an innovative two stage adaptive control system that auto- 
mates the selection of efficient Gaussian parameter proposal 
distributions. The annealing of the proposal distributions 
carried out by the control system combined with parallel 
tempering makes it practical to attempt a blind search for 
multiple planets simultaneously. This was done for the anal- 
ysis of the current data set and for th e analysis of the HD 
208487 reported earlier i|Gregorvll2007l ). 

This paper presents a Bayesian re-analysis of the exist- 
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1 Following on from the pion eering work on Bayesian peri- 
odograms bv ljavnesl [1198711 and iBretthorstl | | 1988ft 
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ing 87 precisi on radial velocity m easurements for HD 11964 
published by iButler et alj l|2006l ). who reported the detec- 
tion of a single planet with a period of 2110 ± 270d after 
removing a trend in the data. They remark that the 5.3m 
s _1 residuals are comparable to the 9m s _1 amplitude, plac- 
ing the exoplanetary interpretation of the velocity variations 
somewhat in doubt. 



2 ANALYSIS 

The analysis of the HD 11964 data employed exactly 
the same Bayesian multi-planet Kepler periodogram 
that wa s previously d escribed for the analysis of HD 
208487 l|Gregorvll2007l ). The periodogram utilizes a par- 
allel tempering Markov chain Monte Carlo algorithm 
which yields the probability density distribution for each 
model parameter and permits a direct comparison of 
the probabilities of models with differing numbers of 
planets. In parallel tempering each chain corresponds to 
a different temperature. We parameterize the tempera- 
ture by its reciprocal, (3 = 1/T which varies from zero 
to 1. For parameter estimation purposes 12 chains (/3 = 
{0.05, 0.1, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.70, 0.80, 0.90, 1.0}) 
were employed and the final samples drawn from the (3 = 1 
chain, which corresponds to the desired target probability 
distribution. For /3< 1, the distribution is much flatter. 

At intervals, a pair of adjacent chains on the tempering 
ladder are chosen at random and a proposal made to swap 
their parameter states. The mean number of iterations be- 
tween swap proposals was set = 8. A Monte Carlo accep- 
tance rule determines the probability for the proposed swap 
to occur. This swap allows for an exchange of information 
across the population of parallel simulations. In the higher 
temperature simulations, radically different configurations 
can arise, whereas in higher (3 (lower temperature) states, a 
configuration is given the chance to refine itself. 

The samples from hotter simulations were also used 
to evaluate the marginal (global) likeli hood needed for 
mod el selecti o n, fol lowing Section 12.7 of iGregorvl (1200581 ) 
and IGregorvl i|2007l ). This is discussed more in Section 2] 
Marginal likelihoods estimated in this way require many 
more parallel simulations. For HD 11964, 40 (3 levels were 
used spanning the range f3 = 10 -8 to 1.0 with a mean inter- 
val between swaps = 3. 

For a one planet model the predicted radial velocity is 
given by 

v{U) = V + K[cob{6(U + xP) + +ecosw], (1) 

and involves the 6 unknown parameters 

V = a constant velocity. 

K = velocity semi-amplitude. 

P = the orbital period. 

e = the orbital eccentricity. 

u = the longitude of periastron. 

X = the fraction of an orbit, prior to the start of data 
taking, that periastron occurred at. Thus, xP — the number 
of days prior to ti = that the star was at periastron, for 
an orbital period of P days. 

9(ti + xP) — the angle of the star in its orbit relative 
to periastron at time ti, also called the true anomaly. 



We utilize this form of the equation because we obtain 
the dependence of 9 on ti by solving the conservation of 
angular momentum equation 

d9 _ 2ir{l + eco S 8(t l + X P)} 2 . , 

dt p(i_ e 2)3/2 u - I > 

Our algorithm is implemented in Mathematica and it proves 
faster for Mathematica to solve this differential equation 
than solve the equations relating the true anomaly to the 
mean anomaly via the eccentric anomaly. Mathematica gen- 
erates an accurate interpolating function between t and 9 
so the differential equation does not need to be solved sepa- 
rately for each ti. Evaluating the interpolating function for 
each ti is very fast compared to solving the differential equa- 
tion, so the algorithm should be able to handle much larger 
samples of radial velocity data than those currently available 
without a significant increase in co mputational tim e. 

As described in more detail in IGregorvl 120071 . we em- 
ployed a re-parameterization of x anc ' u to improve the 
MCMC convergence speed motivated by the work of Ford 
(2006). The two new parameters are tp = 2irx + w and 
4> = 2nx — uj. ip is well determined for all eccentricities. 
Although <p is not well determined for low ecce ntricities, 
it is at least orthogonal to the tp parameter. In IGregorvl 
I2007I . we recommended a uniform prior for tp in the inter- 
val to 2n and uniform prior for <p in the interval — 2n 
to +2n, which is the smallest rectangle in (tp, <p) that uni- 
formly samples the full range in (x, uj)- However, a posterior 
that is a wraparound continuous in (Xi w ) does not map 
into a wraparound continuous distribution in this rectan- 
gle of (tp,<p). This can reduce the algorithms performance 
and convergence. A simple fix is to double the range of tp 
to (0 < tp < 47r). The big (ip,<p) square holds two copies of 
the probability patch in (Xi w ) which doesn't matter. What 
matters is that the posterior is now wraparound continuous 
in (tp,(p). 

In a Bayesian analysis we need to specify a suitable prior 
for each parameter. These are tabulated in TableQ] Detailed 
argum ents for the choice of each prior were given in lGregorvl 
120071 . The lower bound on the search period of l.Old was 
em ployed to avoid a possible 1 day sampling artifact. 

lGregory|2007l discussed two different strategies to search 
the orbital frequency parameter space for a multi-planet 
model: (a) an upper bound on /1 ^ /2 is uti- 

lized to maintain the identity of the frequencies, and (b) all 
fi are allowed to roam over the entire frequency range and 
the parameters re-labeled afterwards. Case (b) was found to 
be significantly more successful at converging on the high- 
est posterior probability peak in fewer iterations during re- 
peated blind frequency searches. In addition, case (b) more 
easily permits the identification of two planets in 1:1 reso- 
nant orbits. We also adopted approach (b) in the current 
analysis. 

All of the models considered in this paper incorporate 
an extra noise parameter, s, that can allow for any addi- 
tional noise beyond the known measurement uncertainties^]. 
We assume the noise variance is finite and adopt a Gaussian 
distribution with a variance s 2 . Thus, the combination of 

2 In the absence of detailed knowledge of the sampling distribu- 
tion for the extra noise, we pick a Gaussian because, for any given 
finite noise variance, it is the distribution with the largest uncer- 
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Table 1. Prior parameter probability distributions. 



Parameter 



Prior 



Lower bound Upper bound 



Orbital frequency 

Velocity Ki 
(m s" 1 ) 

V (m s" 1 ) 

ej Eccentricity 



p(ln h , In f 2 , ■ ■ ■ In /„|M«, I) = ^TFmW' 
(n =number of planets) 

Modified Jeffreys^, 

(K+Kq)- 1 



1+ w (*fry»^. 



^max 

Uniform 



1/1.01 d 1/1000 yr 



(K = l) ^max (%^) 1/3 ^= 



iv max = 2129 

ax 

o 1 



o;i Longitude of 
pcriastron 

s Extra noise (m s _1 ) 
(m s- 1 ) 



Uniform 



In 1+- 



o 



2- 



(s = 1) K n 



a Since the prior lower limits for K and s include zero, we used a modified Jeffreys prior of the form 

p(X\M,I) = — i- , X — 7 (3) 

X + X m (i + ^) 

For X -C Xq, p(X\M, I) behaves like a uniform prior and for X ^> Xq it behaves like a Jeffreys prior. The 



In (l - 



jp° x j term in the denominator ensures that the prior is normalized in the interval to X n 



the known errors and extra noise has a Gaussian distribution 
with variance = of +s 2 , where Oi is the standard deviation of 
the known noise for i th data point. For example, suppose that 
the star actually has two planets, and the model assumes 
only one is present. In regard to the single planet model, the 
velocity variations induced by the unknown second planet 
acts like an additional unknown noise term. Other factors 
like star spots and chromospheric activity can also con- 
tribute to this extra velocity noise term which is often re- 
ferred to as stellar jitter. Several researchers have attempted 
to estimate stellar jitter for individual st ars based on sta- 
tistical correlations wi t h observables (e.g.. lSaar fc Donahue! 
Il997l . ISaar et al.Hl998l . IWrightJ liooBT) . In general, nature is 
more complicated than our model and known noise terms. 
Marginalizing s has the desirable effect of treating anything 
in the data that can't be explained by the model and known 
measurement errors as noise, leading to conservative esti- 
mates of orbital parameters (see Sections 9.2.3 and 9.2.4 of 
iGregorvl |2005al ) for a tutorial demonstration of this point). 
If there is no extra noise then the posterior probability dis- 
tribution for s will peak at s — 0. The upper limit on s was 
set equal to K max . We employed a modified Jeffrey's prior 
for s with a knee, so = lm s _1 . 



tainty as me asured by the entropy, i.e., the maximum entropy 
distribution j JavnesHl957l , lGregonl2005al section 8.7.4.) 



3 RESULTS 

Figure [T] s hows the precision radial velocity data for HD 
11964 from lButler et akl (2006) who reported a single planet 
with M sin i = 0.61 ± 0.10 in a 2110 ± 270 day orbit with an 
eccentricity of 0.06 ±0.17. Panels (b) and (c) show our best 
fitting three planet light curve and residuals. 

If we assume that all the models considered are equally 
probable a priori, then as shown in Section 3] the three 
planet model is ^ 600 times more probable than the next 
most probable model which is a two planet model. In this 
section, we mainly focus on the MCMC results for the three 
planet model. 

Figure [2] shows post burn-in MCMC iterations for the 
parameters of a three planet model returned by the Ke- 
pler periodogram, starting from an initial location (Pi = 
10, P2 = 500 &i P3 — 2300d) in period parameter space. 
Similar results were obtained with other different start- 
ing positions. A total 10 6 iterations were used with ev- 
ery tenth iteration stored. For display purposes only every 
hundredth stored point is plotted in the figure. The upper 
left panel is a plot of Logio (prior x likelihood). The next 
two panels of the top row shows the extra noise parame- 
ter s and the constant velocity parameter V, respectively. 
The remaining panels show the orbital parameters for each 
of the three periods. The equilibrium solution corresponds 
to Pi = 38d, P 2 = 360d, & P 3 = 1924d (for compari- 
son, the two planet model MCMC yielded two solutions of 
Pi = 360 & P 2 = 1990d and P 2 = 38 & P 3 = 1932d, respec- 
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Table 2. Three planet model parameter estimates. 
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Figure 1. The data is shown in panel (a) and the best fitting 
three planet (Pi = 38, P 2 = 357 , & P 3 = 1928 days) model 
versus time is shown in (b). Panel (c) shows the residuals. 



tively) . All the traces appear to have achieved an equilibrium 
distribution. 

The Xi and LUi traces were derived from the correspond- 
ing ipi, 4>i traces. The ipi, 4>% traces are not shown. A correla- 
tion is clearly evident between P2 and e2 which is best seen 
in the joint marginals plotted in Figure [4] Each dot is the 
result fr om one iteration . 

The lGelman-Rubinl 0992) statistic is typically used to 
test for convergence of the parameter distributions. In paral- 
lel tempering MCMC, new widely separated parameter val- 
ues are passed up the line to the (3=1 simulation and 
are occasionally accepted. Roughly every 100 iterations the 
= 1 simulation accepts a swap proposal from its neigh- 
boring simulation. The final /3 — 1 simulation is thus an 
average of a very large number of independent /3 = 1 sim- 
ulations. What we have done is divide the /3 = 1 iterations 
into ten equal time intervals and inter-compared the ten dif- 
ferent essentially independent average distributions for each 
parameter using a Gelman- Rubin test. For all of the three 
planet model parameters the Gelman-Rubin statistic was 
< 1.03. 



Parameter 


planet 1 


planet 2 


planet 3 


P(d) 


38.02t°;® 
(38.07) 


36011 
(357) 


19251H 
(1928) 


K (m s" 1 ) 


*'°-0.7 

(4.8) 


6 l +3 '° 
(5.4) 


7 +o.8 
a - ' -0.8 
(10.0) 


e 


0.231;*° 
(0.31) 


°- 63 -'.13 

(0.63) 


o.o5l;° 3 5 

(0.09) 


u (deg) 


123+ 41 
lzo -48 

(111) 


103^34 

(107) 


195l 8 7 » 
(205) 


a (au) 


fl 9K97+-0085 
u.zoz<_ 0085 

(0.253) 


-, -|o 9 +.039 
±.10 -.0 3g 
(1.124) 


3.46l;« 
(3.46) 


M sini (Mj) 


0.090+;™ 
(0.098) 


(0.191) 


77+ -08 
uw '-.08 

(0.795) 


Pcriastron 
passage 

(JD - 2,440,000) 


12737^3 
(12736) 


12397^3! 
(12421) 


10535ll? 4 
(10564) 


s (m s _1 ) 


4.9+J 
(4.7) 


3.7+- 4 4 
(3.3) 


2.41-1 
(1.9) 



Figure [3] shows the individual parameter marginal dis- 
tributions for the three planet model. Table [2] gives our 
Bayesian three planet orbital parameter values and their 
errors. The parameter value listed is the median of the 
marginal probability distribution for the parameter in ques- 
tion and the error bars identify the boundaries of the 68.3% 
credible region. The value immediately below in parenthesis 
is t he maximum a poste riori (MAP) value determined using 
the lNelder-Meadl (|l965l ) downhill simplex method. The val- 
ues derived for the semi-major axis and M sin i, and their er- 
rors, are based on the assume d mass of the s t ar = 1 .49±0.15 
(|Valenti fc Fischerll2005ft . iButler et al.l <l200tj) assumed 



1.12 Mq but also quote Valenti fc Fische"rl 



-i 

a mas s of 

|2005l ) as the reference. The last row gives the Bayesian es- 
timate of the extra noise parameter (stellar jitter) for each 
model. 

In Figure [5] panel (a) shows the data, with the best 
fitting P2 and P3 orbits subtracted, for two cycles of Pi phase 
with the best fitting Pi orbit overlaid. Panel (b) shows the 
data plotted versus P 2 phase with the best fitting Pi and P3 
orbits removed. Panel (c) shows the data plotted versus P3 
phase with the best fitting Pi and P 2 orbits removed. 



4 MODEL SELECTION 

To compare the posterior probabilities of the i th planet 
model to the one planet models we need to evaluate the 
odds ratio, On = p(h"h\D, I)/p(K'h\D, I), the ratio of the 
posterior probability of model Mi to model Mi . Application 
of Bayes's theorem leads to, 



Oil 



p(Mj\I) p(D\Mj,I) = p(Mj\I) 
p(Mi|7) p{D\hhJ) ~ p(Mi|J) 



(4) 
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Figure 2. MCMC post burn-in parameter iterations for a three planet model. The upper left panel is a plot of Logio (prior X likelihood). 



where the first factor is the prior odds ratio, and the sec- 
ond factor is called the Bayes factor. The Bayes factor is 
the ratio of the marginal (global) likelihoods of the mod- 
els. The MCMC algorithm produces samples which are in 
proportion to the posterior probability distribution which 
is fine for parameter estimation but one needs the propor- 
tionalit y constant fo r estimating the model marginal like- 
lihood. IClvdel (|2006l ) recently reviewed the state of tech- 
niques for model select ion from a statistics perspective and 
iFord fc Gregorvl (|2006l ) have evaluated the performance of 



a variety of marginal likelihood estimators in the extrasolar 
planet context. 



In this work we will compare the results from three 
marginal likelihood estimators: (a) parallel tempering, (b) 
ratio estimator, and (c) restricted Monte Carlo. A brief out- 
line of each method is presented in Sections l4.lll4.2l and !4.3l 
The results are summarized in Section T4.4I 
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Figure 3. Marginal parameter probability distributions for the three planet model. 
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Figure 4. A selection of joint marginal parameter probability 
distributions for the three planet model. 



4.1 Parallel tempering estimator 

The MCMC samples from all (rip) simulations can be used 
to calculate the marg i nal like lihood of a model according to 
equation ([5} iGregorvl (|2005aT l. 



\n\p{D\Mi,I)] 



dp{\n[p{D\M u X,I)])p, 



(5) 



where i = 0, 1, 2, 3, 4 corresponds to the number of planets, 
and X represent a vector of the model parameters which 
includes the extra Gaussian noise parameter s. In words, for 
each of the np parallel simulations, compute the expectation 
value (average) of the natural logarithm of the likelihood 
for post burn-in MCMC samples. It is necessary to use a 
sufficient number of tempering levels that we can estimate 
the above integral by interpolating values of 

{]n\p(D\Mi,X,I)])p = I Vlnb(X>|M 4 ,X I)]p, (6) 
t 

in the interval from j3 = to 1, from the finite set. For this 
problem we used 40 tempering levels in the range P = 10 -8 
to 1.0. Figure [U] shows a plot of (hi[p(D\Mi, X, I)\)p versus 
P. The inset shows a blow-up of the range P = 0.1 to 1.0. 

The relative importance of different decades of p can 
be judged from Table [3] The second column gives the frac- 
tional error that would result if this decade of /3 was not 
included and thus indicates the sensitivity of the result to 
that decade. The fractional error falls rapidly with each 
decade and for the lowest decade explored in this run, 
P = 10~ 8 — 10~ r , reaches 0.21. From Figure[6] it is apparent 
that the steep drop in the curve that occurs below (3 = 10~ 6 
shows a significant change in curvature in the direction of 
leveling off similar to t hat experienced in the case of HD 
20878 4 (| Gregory! |2007h and HD 188133 (|Ford fc Gregory! 
120061 ). In the case of HD 208487 (2 planet model), the frac- 
tional error reached 0.16 in the range P = 10~ 6 - 10~ 7 and 
the error fell to 0.02 in the next decade. For HD 188133 (one 
planet model), the fractional error reached 0.26 in the range 
/3 = 10~ 5 — 10~ 6 and the contribution to the fractional error 
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Figure 5. Panel (a) shows the data, with the best fitting P2 and 
P3 orbits subtracted, for two cycles of Pi phase with the best 
fitting Pi orbit overlaid. Panel (b) shows the data plotted versus 
P2 phase with the best fitting Pi and P3 orbits removed. Panel 
(c) shows the data plotted versus P3 phase with the best fitting 
Pi and P2 orbits removed. 
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Figure 6. A plot of (ln\p(D\M s , X, I)]) p versus /3 for the three 
planet model. The inset shows a blow-up of the range j3 = 0.1 to 
1.0. 
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Table 3. Fractional error versus /3 for the three planet model 
results shown in Figure [6] 



j3 ranj 






Fractional error 


1.0 - 


io- 1 




9.59 x 10 101 


lO" 1 


- 10" 


-2 


4.55 x 10 13 


10~ 2 


- io- 


-3 


225 


10~ 3 


- io- 


-4 


3.46 


io- 4 


- io- 


-5 


1.32 


10- s 


- io- 


-6 


0.77 


10~ 6 


- 10" 


•7 


0.51 


IO" 7 


- io- 


-8 


0.21 



for the next 4 decades was 0.14. Based on these comparisons 
we estimate that ignoring lower decades of f3 will result in a 
systematic underestimate of p(D\Ms, I) of ~ 15%. A similar 
table for the two planet PT results gave the fractional error 
for the lowest decade at 0.11. 



4.2 Marginal likelihood ratio estimator 

Our s econd method was introduced by iFord fc Gregory! 
(2006). It makes use of an additional sampling distribution 
h(X). Our starting point is Bayes' theorem 



p(X\M h I) 



p(X\Mi,I)p(D\Mi,X,I) 
p(D\Mi,I) 



(7) 



Re-arranging the terms and multiplying both sides by h(X) 
we obtain 

p(D\Mi,I)p(X\Mi,I)h(X) = 

p(X\Mi,I)p(D\Mi,X,I)h(X). (8) 

Integrate both sides over the prior range for X. 



p(D\Mi,I) RE J p(X\Mi,I)h(X)dX = 

p(X\Mi,I)p(D\Mi, X, I)h{X)dX. (9) 



The ratio estimator of the marginal likelihood, which we 
designate by p(D\Mi, I)re, is given by 

f p(X\Mi,I)p(D\Mi,X,I)h(X)dX 

p(DMi,I) RB = JPy ' t Py ' J - ' . (10) 

fp(X\Mi,I)h(X)dX 

To obtain the marginal likelihood ratio estimator, 
p(D\Mi, I)rb, we approximate the numerator by drawing 
samples X 1 , X 2 , ■ ■ ■ , X" 3 from h(X) and approximate the 
denominator by drawing samples X 1 , X 2 , ■ ■ ■ , X ns from the 
/3 = 1 MCMC post burn-in iterations. 

mm - I] '- itsvg) ■ 

The arbitrary function h(X) was set equal to a multivariate 
normal distribution (multinormal) with a covariance matrix 



3 Initially proposed by J. Berger, at an Exoplanet Workshop 
sponsored by the Statistical and Applied Mathematical Sciences 
Institute in Jan. 2006 



equal to twice the covariance matrix computed from a sam- 
ple of the P = 1 MCMC output. We usedQ n' a = 10 5 and 
n s from 10 4 to 2 x 10 5 . Some of the samples from a multi- 
normal h(X) can have non physical parameter values (e.g. 
K < 0). Rejecting all non physical samples corresponds to 
sampling from a truncated multinormal. The factor required 
to normalize the truncated multinormal is just the ratio of 
the total number of samples from the full multinormal to the 
number of physical valid samples. Of course we need to use 
the same truncated multinormal in the denominator of equa- 
tion (|10|) so the normalization factor cancels. p(D\M2, I)re 
converges much m o re rap idly than the parallel tempering 
estimator ICregorvl (|2007f ) and the parallel tempering esti- 
mator, p(D\ M2, I)pt, required 40 j3 simulations instead of 



4-2.1 Mixture model 

It is clear that a single multinormal distribution can not be 
expected to do a very good job of representing the corre- 
lation between the parameters that is evident b etwee n P2 
and e2 in Figure [4] Following IFord fc Gregory! (120061 ) , we 
improve over the single multinormal by using a mixture of 
multivariate normals by setting 



M*)=-J-y>(*) 

3=1 



(12) 



where we must determine a covariance matrix for each 
hj(X) using the posterior sample. We choose each mix- 
ture component to be a multivariate normal distribution, 
hj(X) — N(X\Xj, Ej), where we must determine a covari- 
ance matrix for each hj using the posterior sample. First, we 
compute p, defined to be a vector of the sample standard de- 
viations for each of the components of X , using the posterior 
sample. Next, define the distance between the posterior sam- 
ple Xi and the center of hj {X), d 2 j = J^ fe (Xki — Xkj) /pi, 
where k indicates the element of X and p. Now draw another 
random subset of 100n c samples from the original posterior 
sample (without replacement), select the 100 posterior sam- 
ples closest to each mixture component and use them to 
calculate the covariance matrix, Sj, for each mixture com- 
ponent. Since the posterior sample is assumed to have fully 
explored the posterior, h(X) should be quite similar to the 
posterior in all regions of significant probability, provided 
that we use enough mixture components. 

4.3 Restricted Monte Carlo marginal likelihood 
estimate 

We can also make use of Monte Carlo integration to evaluate 
the marginal likelihood as given by equation fl 13 p . 



p(D\Mi,I) = / p(X|Mi,J)jj(£>|Mr,X,/)dX 



(13) 



Monte Carlo (MC) integration can be very inefficient in ex- 
ploring the whole prior parameter range, but once we have 
established the significant regions of parameter space with 



4 According to I Ford "fc Gregorvl [2006), the numerator converges 
more rapidly than the denominator. 
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the MCMC results, this is no longer the case. The outer bor- 
ders of the MCMC marginal parameter distributions were 
used to delineate the boundaries of the volume of parameter 
space to be used in the Monte Carlo integration. RMC inte- 
gration was carried out for models Mi, M2, and M3 based 
on 4 x 10 6 samples and repeated three times. 

4.4 Summary of model selection results 

Table |4] summarizes the marginal likelihoods and Bayes fac- 
tors comparing models Mo, M2, M3, M4 to Mi. For model 
Mo, the marginal likelihood was obtained by numerical inte- 
gration. For Mi, the value and error estimate are based on 
the RMC method discussed in Section \4. 31 the RE method 
(1 mixture component), discussed in Section 14.21 and the 
RE method (100 mixture components). Each method was 
repeated 3 times on the same posterior sample to ascertain 
the variance of repeated trials. The quoted uncertainty is 
the standard deviation of the repeats. The sample error of 
the mean is a factor of 1 /\/3 smaller. Since all three methods 
yield approximate marginal likelihoods it is not clear which 
is the most accurate but we are inclined to favor the RE 
method with 100 mixture components. All three estimates 
agree within 15%. 

For model M2 , two peaks in the joint posterior probabil- 
ity distribution were detected: (A) Pi = 362d, P 2 = 1984d, 
and (B) Pi = 37.98d, P 2 = 1897d. The contribution to 
the marginal likelihood from each peak was estimated from 
the posterior samples from the /3 = 1 chain after filter- 
ing the posterior samples in Pi and P 2 to exclude samples 
from the other peak. The ratio estimator method was em- 
ployed with three different mixture components 1, 100, & 
500. For each peak, the results agreed well within a fac- 
tor of better than 2. The 100 and 500 mixture compo- 
nents agreed more closely and appeared to be systemati- 
cally lower than for the 1 component version. On the ba- 
sis of these results, peak A is a factor of ~ 10 more prob- 
able. Combining the RE 500 component results for both 
peaks yields a p{D\Mz,I) = 2.3 x 10 -124 which is close to 
the 3.0 x 10~ 124 value derived from the parallel tempering 
method which was discussed in Section 14.11 Our final esti- 
mate is (2.5 ±0.5) x 10" 124 . 

The marginal likelihood for M3 was estimated from the 
posterior samples from the /3 = 1 chain using the ratio es- 
timator (RE), for 1, 100, and 500 mixture components, by 
RMC integration, and by the PT method which makes use 
of the samples from 40 tempering chains. The results for 
the 100 and 500 mixture components are in good agreement 
and are a factor of ~ 2 less than the one component RE 
results. It is to be expected that the multiple mixture com- 
ponent versions will do a better job of modeling correlated 
parameters than a single component model. 

Figure [7] shows the behavior of the PT marginal likeli- 
hood estimator when it is computed using different numbers 
of iterations taken from a particular run. The results for the 
three planet model, using three such MCMC runs, are shown 
by the thin black curves. The iteration number indicated on 
the abscissa is the total number of iterations executed but 
only a fraction (typically every tenth or less) were saved 
and used for this analysis. For comparison, the results from 
repeated trials of the RE marginal likelihood estimates ver- 
sus iteration are shown. The dashed curves are RE using 



one component. The thick black curves are RE with 100 
mixture components and the gray curves correspond to 500 
components. It is apparent that two of the PT runs have 
not yet converged. The third appears to have leveled off at 
a value of p{D\M$, I) = 2.8 x 10~ 120 but of course more it- 
erations would be desirable. All of the PT results argue for 
a value significantly less than for the RE method. Finally, 
the result of two RMC trials is even lower at 1.8 x 10 -121 . 
It is a difficult question to decide which estimate is best. As 
a summary we have quoted the PT result with errors that 
span the other two methods. In spite of the large uncer- 
tainty, the evidence favoring the three planet model is very 
strong. Assuming equal model priors, then from the lower 
limit for p(D\M 3 , 1) of 1.8 x 10~ 121 and the upper limit for 
p(D\M 2 ,I) of 3.0 x 10~ 124 , we conclude that for a fair bet 
the odds in favor of M3 over M 2 is ^ 600. Similarly, the 
odds in favor of M3 over Mi is > 6 x 10 6 . 

Table U also gives an estimate for p(D\Mi, I) based on 
four repeats of the ratio estimator with 100 components. Be- 
cause of the large spread in results, our summary is the ge- 
ometric mean of the individual values. In view of the results 
for the three planet model we consider the RE estimates for 
p{D\Mi, I) as an upper limit. 

Column 5 of Table [3] gives the nominal Bayes factor 
comparing each model to the one planet model. Assuming 
equal model priors, the probability of model Mi is given by 

p(D\Mi,I) 



p{Mi\D,I) = 



Y? j= oP{D\Mj,I) 



(14) 



Nominal model probabilities excluding model M4 are given 
in Column 6. The results overwhelmingly favor the three 
planet model. 

Column 7 and 8 list the most probable values of the 
extra noise parameter, s, and the RMS residuals in m s _1 , 
respectively. 



5 DISCUSSION 

In this paper, we have demonstrated that a sophisticated 
Bayesian analysis of the published data for HD 11964 finds 
strong evidence for two additional planets. Is it likely that 
there are many other cases among the 200 published RV 
data sets that this type of an analysis would yield evidence 
for additional planets, or is the HD 11964 system likely to 
be unique or rare? To date, the Bayesian MCMC Kepler 
periodogram has been ru n on only a smal l number of data 
sets incl u ding HP 73526 iGregorvl (|2005bh and HD 208487 
ICregorvl ((2007) , which both yielded evidence for an addi- 
tional planet. It thus appears likely that the algorithm is 
capable of detecting many additional exoplanet cantidates 
in the published RV data. Although the current implemen- 
tation of the algorithm is not particular fast (19h for a typ- 
ical 3 planet model run of 10 6 iterations with 12 tempering 
chains), it has many advantages that were outlined in Sec- 
tion [j] 

One source of error in the measured velocities is jitter, 
which is due in part to flows and inhomogeneities on the 
stellar surface. Wright (2005) gi ves a model that e stimates, 
to within a factor of roug hly 2 l|Butler et al.ll2006h , the jit- 
ter for a star based upon a stars activity, col or, Teff, and 
height above the main sequence. For HD 11964. [Butler et all 
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Table 4. Marginal likelihood estimates, Bayes factors and probabilities for the 5 models. The last two columns list the MAP value of 
extra noise parameter, s, and the RMS residual. 



Model Method 



Mixture 
components 



Marginal 
Likelihood 



Bayes factor 
nominal 



Probability 
nominal 



(m s- 1 ) 



RMS residual 

(m s- 1 ) 



M 



Exact 



6.86 x 10" 



2.7 x 10- 10 2.5 x 10- 



7.6 



8.0 



Mi 
Mi 
Mi 
Mi 

Mm 

Mm 

Mm 

M 2A 

M 2B 

M 2B 

M 2B 

M 2B 

M 2 

M 2 

M 2 

M 2 

M 3 
M 3 
M 3 
M 3 
M 3 
M 3 

M 4 
M 4 
M 4 
M 4 
M 4 



RMC 

RE 

RE 

Summary 

RMC 

RE 

RE 

RE 

RMC 

RE 

RE 

RE 

RE (A+B) 
RMC (A+B) 
PT 

Summary 

RMC 

RE 

RE 

RE 

PT 

Summary 

RE 
RE 
RE 
RE 

Summary 



1 

100 



1 

100 
500 

1 

100 
500 
500 



1 

100 
500 



100 
100 
100 
100 



2.31 ±0.01) x 10~ 128 
2.86 ±0.07) x 10~ 128 
2.50 ±0.06) x 10~ 128 
2.50 ±0.4) x 10~ 12S 

1.5 ±0.2) x 10- 124 

3.02 ±0.16) x 10~ 124 
2.08 ±0.06) x 10- 124 
1.98 ±0.08) x 10~ 124 

3.3 ± 0.2) x 10- 125 
3.86 ±0.26) x 10~ 125 
3.28 ±0.15) x 10~ 125 
2.95 ±0.18) x 10~ 125 
2.3 ± 0.08) x 10- 124 
1.8 ±0.3) x 10~ 124 

oX2 \in-124 
•>xl/2' iU 

2.5 ±0.5) x 10- 124 



1.8 ±0.3) x 10- 121 
14 ±3) x 10" 119 
4.95 ±0.44) x 10- 119 
5.00 ±0.44) x 10- 119 
2.8 x 10- 120 



(2.8 



X18 
xl/16 



) x 10- 

126 
125 



3.2 X 10" 

1.5 x 10~ 
3.7 x 10^ 125 
1.0 x 10- 124 
^ 1.9 x 10- 125 



1.0 



9 x 10~ 9 



4.7 



5.3 



1.0 x 10 4 



9 x 10" 5 



3.3 



4.1 



1.1 x 10 8 



0.99991 



1.9 



3.0 



^ 760 



M 4 excluded 



1.5 



2.5 




1 x10 6 



2x10 6 



3x10 6 
Iterations 



4x10 6 



5x10 6 



Figure 7. The thin solid black curves show 3 repeats of the parallel tempering marginal likelihood method versus iteration number for 
the three planet model. The dashed curves show 6 repeats of the ratio estimator method using only one mixture component. The thick 
black curves shows the result for 4 repeats of the ratio estimator method using 100 mixture components and the gray curves correspond 
to trials using 500 mixture components. 
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(2006) quote a jitter estimate of 5.7m s _1 , based on Wright's 
model. Our models Mo to M4 employ instead an extra Gaus- 
sian noise nuisance parameter, s, with a prior upper bound 
of equal to ifmax = 2129m s _1 . Anything that cannot be 
explained by the model and published measurement uncer- 
tainties (which do not include jitter) contributes to the extra 
noise term. Of course, if we are interested in what the data 
have to say about the size of the extra noise term then we can 
readily compute the marginal posterior for s. The marginal 
for s for M3 is shown in the middle panel of the 6 th row in 
Figure [3] The marginal for s shows a pronounced peak with 
a median of 2.4m s _1 and a MAP value of 2.0m s _1 . The 
MAP value of s for all our models is tabulated in Table [4] 
For Mi, the Map value is 4.9m s _1 which is well within 
the factor of two u ncertainty of the jitter estimate given in 
iButler et all (120061 ) based on Wright's model. The results of 
our Bayesian model selection analysis indicate that a three 
planet model is ^ 6 x 10 6 times more probable than a one 
planet model with the previously estimated jitter. 

It is interesting to compare the performance of the 
three marginal likelihood estimators employed in this work 
to their perfor mance in the two planet fit for HD 208487 
jGregory||2007] ). For HD 208487 the parallel tempering esti- 
mator, based on 34 chains, required ~ 1.5 x 10 6 iterations 
for convergence. The two separate runs agreed within a fac- 
tor of 2.2. The average of the two HD 208487 PT results 
agreed with the RMC and RE (one component) within 20%. 
For the two planet fit of HD 11964 with 40 chains, conver- 
gence required 5 x 10 6 iterations. Since there were two peaks 
in the posterior, the RE and RMC had to be run on each 
peak separately and the two peak contributions added be- 
fore comparing to the PT result which integrates over the 
entire posterior. The single PT run agreed with the RMC 
and RE estimates within a factor of ~ 2. For both HD208487 
and HD 11964, the RMC and RE results for a two planet 
model agreed within ~ 25%. 

For HD 11964, the model M3 results from the three 
methods spanned a much larger range. Further it is clear 
that two of the three PT runs have not converged, in one 
case after 6 x 10 6 iterations with 40 tempering chains which 
took 16 days on a fast single core PC. This experience sug- 
gests that it may not be feasible to compute parallel temper- 
ing marginal likelihoods for models involving three or more 
planets, which will typically require ^ 40 chains. Parallel 
computing could help but there is still a need for more ef- 
ficient, accurate and well calibrated methods for computing 
the marginal likelihoods. MCMC is great for parameter es- 
timation, so perhaps more effort is required to include the 
number of planets as an additional parameter as has been 
done in other areas. 



6 CONCLUSIONS 

In this paper, we further demonstrated the capabilities of 
an automated Bayesian parallel tempering MCMC approach 
to the analysis of precision radial velocities. The method is 
called a Bayesian Kepler periodogram because it is ideally 
suited for detecting signals that are consistent with Kepler's 
laws. However, it is more than a periodogram because it 
also provides full marginal posterior distributions for all the 
orbital parameters that can be extracted from radial velocity 



data. Moreover, it is a very general algorithm that can be 
applied to many other non linear model fittin g problems. 

The HD 11964 data |Butler et al.l 120061 ) has been re- 
analyzed using 1, 2, 3 and 4 planet models. The most prob- 
able model exhibits three periods of 38.02+QQ5, 36014 and 
1924^43 d, and eccentricities of 0.22±£^, 0.63^7 and 
0.051q'q5, respectively. Assuming the three signals (each one 
consistent with a Keplerian orbit) are caused by planets, the 
corresponding limits on planetary mass (Msini) and semi- 
major axis are 

(Q.090+ ilMj, 0.253i°-gg»au), (0.21±° ™Mj, 1.13t°-°», 
(0.771*;°^, 3.46^;i 3 3 au), 

respectively. Based on our three planet model results, the 
remaining unaccounted for stellar jitter is ~ 2m s" 1 . The 
small difference (1.3cr) between the 360 day period and one 
year raise some concern about a possible instrumental ef- 
fect and we suggest that it might be worth investigating the 
barycentric correction for the HD 11964 data. 

Considerable attention was paid to the topic of Bayesian 
model selection. For model fitting involving ^ 2 planets, 
all three marginal likelihood estimators were in good agree- 
ment. For a three planet fit, the RMC and RE results differed 
by a factor of ~ 300 and each differed from the PT result 
by a factor of ~ 17. Further improvements on the model se- 
lection side of this problem are clearly needed, requiring the 
development of more efficient, accurate and well calibrated 
methods for computing the marginal likelihoods. 
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